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A MULTILEVEL MONTE CARLO METHOD FOR A CLASS OF 
MCKEAN-VLASOV PROCESSES 

L.F. RICKETSON 


Abstract. We generalize the multilevel Monte Carlo (MLMC) method of Giles to the 
simulation of systems of particles that interact via a mean field. When the number of 
particles is large, these systems are described by a McKean-Vlasov process - a stochastic 
differential equation (SDE) whose coefficients depend on expectations of the solution as 
well as pathwise data. In contrast to standard MLMC, the new method uses mean field 
estimates at coarse levels to inform the fine level computations. Using techniques from 
the theory of propagation of chaos, we prove convergence and complexity results for the 
algorithm in a special case. We find that the new method achieves L 1 error of size e with 
0(e _2 (loge) 5 ) complexity, in contrast to the 0(e -3 ) complexity of standard methods. 
We also prove a variance scaling result that strongly suggests similar performance of the 
algorithm in a more general context. We present numerical examples from applications 
and observe the expected behavior in each case. 


1. Introduction 


The multilevel Monte Carlo (MLMC) method introduced in [1TJ has proven to be 
a powerful tool for simulation of stochastic differential equations (SDEs) and related 
models. The method and its variants have found applications in finance [5, Tj IS] [TSj (to 
name a few), biochemical kinetics P 2], plasma physics (37, 38], and porous media flow 
pH m, E2 [33j, among others. 

In short, the method is applied to a stochastic differential equation (SDE) 


dX t — a(X t , t)dt + b(X t ,t)dW t (1) 

by introducing a hierarchy of time step “levels” A ti oc and taking advantage of cor¬ 
relations between simulations at adjacent levels to achieve variance reduction. Typically, 
the goal is to compute the expectation of some functional of the SDE’s solution. Standard 
MLMC achieves this with root-mean-square error £ in 0(e _2 (loge) 2 ) time, a dramatic 
improvement over the 0(e~ 3 ) time required by naive Monte Carlo. 

However, the modeling power of SDEs - and all the models used in the references above 
- is limited by the absence of interaction between distinct realizations. In a wide array of 
applications, a large number of particles or agents are not only subject to some external 
stochastic forcing, but also interactions amongst themselves. Examples of this behavior 
abound in physics, including kinetic descriptions of plasmas and rarefied gases, as well as 
ferromagnets. In economics and social sciences, many realistic models must recognize that 
individual agents are influenced by the actions of others [TOj, 33] . Analogous statements 
hold true for various systems in biology jSU [391 EH HU HZ]- 

In many cases, these interactions are mediated by one or more mean fields - that is, 
ensemble averages over all the particles. A quite general form for this type of interaction 
is captured in the system of SDEs 


N„ 


dX l t = a 


x;, R ( x >) ) dt + P 


3 = 1 



R(X’) dWj 


( 2 ) 


3 = 1 
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Here, X' t 6 M d is the i th particle trajectory at time t, the are independent D- 
dimensional standard Brownian motions, N p is the total number of particles, R : —> M 7 

governs the mean-field interaction, a : x M 7 —* M d , and P : x M 7 —» M. dxD . 


For any fixed N p , this is a system of N p d coupled SDEs for the XL i — 1,2 


,N p . 


In applications, particularly those arising from physics, N p is exceedingly large - N p « 
10 23 is not unusual. This makes storage and evolution of even a single sample of the 
N p d dimensional system state computationally intractable, much less the many samples 
required for a Monte Carlo simulation. Intuitively, one would like to view each of the XI as 
a d-dimensional sample from a single overarching stochastic process, thereby dramatically 
reducing the dimensionality of the problem^ 

The mathematical justification for this intuition was developed by McKean 
who showed that in the limit N p —> oo, the evolution of each X\ converges to 


dX t = a (X t , t, Ei2(X t )) dt + /3 (X t , t, EtfpQ) dW t (3) 

in the strong sense * 2 at rate A r p ~ 1 ' /2 , subject to reasonable assumptions on a, /3, and R. 
It is important to note that X\ and X{ satisfying ([ 3 ]) are independent for i 7 ^ j when 
independent Brownian motions are used, while this is not the case for distinct particles 
in ([2]). This result and others of a similar character have come to be called “propagation 
of chaos” theorems - see 021 for a review. 

The process ([3]) is an instance of a McKean-Vlasov process - a term which has come to 
encompass any SDE whose coefficients depend on the probability density of the solution as 
well as its path-wise behavior. In the same way that the probability density of an SDE’s 
solution solves the forward Kolmogorov (i.e. Fokker-Planck) equation, the probability 
density p(x, t ) corresponding to (|3| solves the following nonlinear, nonlocal PDE 


d t p- 1 - V • {a ( x,t,R)p } = - 


d 2 


2 dxidxk 


{P ij (x,t,R)p kj (x,t,R)p} , 


(4) 


where 

R -= f R{y)p{y,t)dy (5) 

J R d 

and summation over repeated indices is implied. 

Any Monte Carlo method for (J3]) is thus expected also to be a valuable numerical 
approach for PDEs of type 0-0 in high dimension. Examples include the Vlasov and 
Landau-Fokker-Planck equations governing kinetic plasma dynamics, and Fokker-Planck 
approximations of the Boltzmann equation, each of which have d = 6 . 

Our new multilevel method extends the computational gains MLMC affords for SDEs 
to McKean-Vlasov processes of type (J3|. In particular, only a logarithmic increase in 
complexity is seen relative to standard MLMC. I 11 contrast to previous methods, the 
approach requires coupling each level in the scheme to all lower levels. This enables low 
variance estimates of the mean field at high levels, even when few particles are sampled. 
This coupling does complicate analysis of the scheme, since distinct samples and levels 
are not independent. This is overcome through use of propagation of chaos techniques. 
Some subtleties in implementation are also introduced, but these only necessitate small 
algorithmic changes. 


3 This is closely related to the “molecular chaos” assumption used in deriving the Botlzmann equation 
from the BBGKY hierarchy. 

2 This is a special case of McKean’s result. In general, the pathwise and mean field dependencies need 
not be factorable. For example, dX = Ex' [(1 + (X — X') 2 ) -1 ] dt + dW, where X' is iid relative to X, 
is a McKean-Vlasov equation to which McKean’s theorem applies, but cannot be written in the form (|3|. 
However, the factorization greatly simplifies application of MLMC and is present in many applications. 
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The remainder of the paper is structured as follows. Section 2 reviews the requisite 
background material, including standard MLMC for SDEs and single-level schemes for 
McKean-Vlasov processes. Section 3 describes the motivation and intuition behind the 
multilevel scheme for McKean-Vlasov processes, then outlines the algorithm. Section 4 
develops a partial theory of the complexity and convergence of the algorithm. Section 5 
presents numerical examples, in which convergence is observed in settings more general 
than our theory requires. We conclude in section 6. Some of the more technical proofs 
are confined to appendices. 


2. Background 

2.1. MLMC for SDEs. In its original form ra,. MLMC is a numerical technique for 
solving the following problem: If X t 6 satisfies |l]) with X 0 known, estimate 

P ~ E |P(X,)]. (6) 

Here, P : x [0, T\ —> M is some pre-specified functional of the solution X t . In finance, 

P is frequently called the ‘payoff function’ - we will use this terminology here, even when 
discussing other applications. 

MLMC improves on the standard Monte Carlo approach, which is to discretize in time, 
usually with the Euler-Maruyama scheme 

X n +\ = X n + a( X n , t n )At + b(X n , t n )AW n , (7) 

where X n « X tnl t n = nAt, and the AW n are independent 1V(0, At) random variables. It 
is well known that this scheme has weak order 1 and strong order 1/2 [23]. One generates 
N independent samples of the solution - indexed by i - and estimates 

1 N 

P “ N E P Xn) • (8) 

S=1 

This method - we will call it ‘standard Monte Carlo’ - has time-stepping error O(At) 
and sampling error 0(l/\^N)- The computational complexity k is proportional to the 
total number of time steps taken, and thus scales as 

K ~Xt =0 ’ (9) 

where £ is the root-mean-square error in the estimate of P. In general, a scheme with 
weak order q will have k ~ e~^ 2+l A) . 

MLMC improves upon this using an iterated control variate strategy. In the simplest 
version, one introduces a hierarchy of time-steps Ate = Ato2~ £ , for t = 0,1,...,L, and 
uses the level i — 1 process as a control variate for the level £ process. More concretely, 
if X‘ is the process at level f, one wishes to approximate E[P(A" L )]. The identity 

E [P (X L )] = E [P (X 0 )] + ^ E [P (X e ) - P (X*- 1 )] (10) 

i =i 

holds trivially. Denoting the variance of the £ th term on the right side by Ve, strong 
convergence at rate r implies that Ve rs j Atf when X 1 and X e_1 are sampled using the 
same underlying Brownian path. 

One can derive the number number of samples necessary to achieve the desired accuracy 
e with minimal complexity. We quote the result without proof, and refer the reader 
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interested in more depth to m 


N e = 


— \/V £ At c ^T 

m=0 



(ii) 


This choice of N £ bounds the mean squared sampling error by e 2 /2. One then begins 
with L = 1 and increments it until 

L i =1 

thereby placing the same (approximate) bound on the mean-squared time-stepping erroi0 
The total complexity of the method is 


£ 2 

- 2 ’ 


( 12 ) 


hi ~ 


L 


E 


Ne_ 

At e 




£ 2 (loge) 2 r = 1/2 
£~ 2 r > 1/2 


(13) 


Thus, MLMC scales better than standard Monte Carlo no matter what discretization is 
used for each, so long as a strong converge rate of at least 1/2 is maintained. 


2.2. Existing Monte Carlo schemes for McKean-Vlasov processes. Numerical 
solution of (j3]) is complicated by the fact that we do not have advance knowledge of K[R\. 
We are thus forced to approximate it by a sample mean - i.e. to sample N particles and 
evolve according to 


dX\ = a 


1 N 

3 = 1 


dt + j3 


1 N 

x;,t-j2 R ( x ’) 

3 = 1 


dWl 


(14) 


Even though (14) is formally identical to ([2]) with N p replaced by N, McKean’s re¬ 
sults ensure we are justified in regarding each X\ as an approximate sample from the 
d-dimensional probability density corresponding to (|3j), rather than the vector of all the 
X\ being a single sample of an Ad-dimensional random variable. As such, we may regard 


N 


p -=-,yE p X) 


(15) 


2=1 


as an estimate of E[P(W)] under the evolution ([3]). 

Of course, we generally must introduce a time-discretization for (14), so in practice we 
evolve 


N 


N 


X'n +1 = X'n + a ( Xl t„ - Y. R(Xi) Ai + /3 X' n , ^ fl(Xj) ] A (16) 

3 =1 / V 1=1 


This scheme was introduced and partially analyzed in m j35j . This work concerned 
strong convergence, and we will make use of its results. In [8, 7j, weak convergence was 
discussed, and 3| builds on that work to show that the cumulative distribution function 
for samples of (lij approximates that of (J3]) to order At + 1 /y/N in the L 1 norm. This 
result, as well as those in [HUH, is limited to d — 1, where it roughly corresponds to a 
weak convergence rate. In arbitrary dimension, the overall complexity of the algorithm 
is either 0(e~ 3 ) - if O(At) weak convergence holds generally - or 0(e -4 ) if not. 


3 This is a simpler criterion than the more conservative approach used in m- Our scheme is not 
substantially changed by this choice, and we find this simpler criterion sufficient for convergence in all 
our numerical experiments. 
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The dimensionality restriction in existing weak convergence results means they are not 
useful here. However, for MLMC, the weak convergence rate is of little importance to 
the scheme’s efficiency. Indeed, (13) shows that improved weak convergence rates may 
only affect the constants in the complexity, not the scaling itself. As a result, we focus 
only on strong convergence rate, which implies weak convergence at the same rate. 


3. Description of Multilevel Algorithm 


The central result of this paper is a multilevel Monte Carlo algorithm for fast solution 
of the following problem: given that X t G satisfies 

dX t = a(X t ,t,ER(X t ))dt + b(X t ,t,ER(X t ))dW t , X 0 = £ (17) 

for t G [0, T], with £ a random variable with some known distribution, estimate 

P(t) := E |P(X,)]. (18) 

As before, P : —>■ W is a pre-specified functional. In the following, we will omit the 

explicit t dependence of a and (3 for brevity. 

Aside from the presence of the mean field, note that we have changed our notion of 
payoff function to allow vector-valued output. This does not complicate the analysis, but 
is useful for applications. When discussing variances of vector-valued functions, we will 
denote 

Var[P(X)] = max Var [P fc (X )]. (19) 

k 

In this way, we ensure that the required error tolerance is met in every component of P. 


3.1. Algorithm Description. We begin by establishing some notation. Let Ate = 
At 0 2~ ( , where l — 0,1,..., L. X ^ 1 , the i th sample of the approximate solution using time- 
step Ate a t time t £ n := nAte- The multilevel approximations of E [R] and E[P] at level £, 
time t l n , R £ n and P £ , respectively. We abbreviate R(XfP) to R^ and similarly for P £, \ 
We extend these definitions to non-integer multiples of Ate by linear interpolation. That 


is, 


( 20 ) 


tf. = (»-kl)flf. 1 + (i-<<+L»J)4j 

for any s G [0, T/Ate], an d similarly for Pf, R 1 /, and so forth. 

Because in this method we require estimates of E[P] at every time step of every level, 
writing a direct analogue of the telescoping sum (10) is notationally cumbersome. Instead, 


we describe an iterative procedure for moving from level £ — 1 to £. We must begin at 
£ = 0, where we use the standard single level scheme (16) with No samples. In our 
notation, 


XZi = xy + « X%\ R° n )Ato + P( x°’\ R° n A w n °’\ 


N 0 


r\\= y* r 

n N c ^ 


0,j 


( 21 ) 


3 =1 


Our level zero estimate of E[P] is just 


N 0 


po = _L P o,j 
n n n ' 


( 22 ) 


3 = 1 


To update our estimates from level £ — 1 to £, we set 





1 


(23) 
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where R O and P^ 3 are evaluations of R and P at a new process X^ 1,3 . 

The core insight of the algorithm is in the definition of X^ 3 . There are two requirements 
of this process. First, the updated estimates in (23) must have the correct expectations 
- = E [R^], and similarly for P £ - in order for the method to be accurate. Second, 

the correction terms - ( R^ 3 — R £ / 2 ) and similarly for P - must have small variance in 
order for the method to be efficient. 

To accomplish the first goal, the X O should be identically distributed to the Xfp 1,3 . 
To accomplish the second, they must be correlated with the X^ 3 . We achieve both using 
the following construction: Xq 1 = X/ 1 for all i, and 


X'nU = x n + a I X" K)At ,+,3 (A'“ R ’ n ) AW, 


r t.i 


r£ A 




rl,i 


V'ii = V'- 1 + a ( XJ\ fir 1 ) At,., + 13 ( X« fi 


r £,i 


r£ A 


-1 


r£ i 


e -i 


Alb 


£,i 


(24) 


Here, A W £,t = A W 2 £ + Alb^j is the coarsened version of the level l Brownian path, 
as in standard MLMC. 

Since they are each evolved using time-step Aiy_! and mean field R ^ 1 , X^ 1 and 
are identically distributed, as required. Further, use of the same initial data and under¬ 
lying Brownian path for X k ' 1 and X £ ’ 1 leads one to expect they are well correlated, as in 
standard MLMC. This expectation is confirmed in section 4. 

Note that none of the samples are independent, which complicates variance estimation. 
In this case, 


Var 



= Var 
+ 2Cov 


pt—i 

ri/2 


+ N, Var 


_ ph* 

r n n/2 


p£—l I p£,i _ p£>i 

r n/1 i l -'n/2 


N e - 1 

N f 


Cov 


n/2 


pi 
M n 



(25) 


and similarly for R f n . Because of the correlation between A A and X/j 2 , we expect the 
second variance on the right to be small. In the absence of the mean field, the covariance 
terms are identically zero, but that is not the case here since the different samples and 
levels are coupled to each other through the iFs. 

Fortunately, though, propagation of chaos results strongly suggest that the distinct 
samples and levels tend toward independence as the number of samples grows large. This 
is because the limiting process (J3| does have independent samples, and for large N our 
process closely approximates this limit. We therefore feel justified in approximating 


max Var 

n 



L 


E 



(26) 


where Vp := max n Var 


p£,% 


p£,z 

7i/2 


Note that for standard MLMC, this is an identity. 
This intuition will be made precise in a restricted version of the mean field case by our 
convergence theory. 


3.2. Algorithm Outline. The algorithm roughly follows that outlined in na, with a 
few additional subtleties introduced by the mean field. Let us briefly summarize the main 
differences and their origins. 

In executing standard MLMC, the following situation frequently arises: One has al¬ 
ready computed IV° ld samples at level t. Then, after incrementing L, the sum in (11) 
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grows, so that the total sample number needed at level i is N^ ew > Nf d . For SDEs, this 
is not problematic. All samples are independent, so one simply generates (A^ new — A° ld ) 
additional samples and uses them to update the relevant quantities. 

However, in the current context, the new samples will change the value of B n , so that 
we now have two sets of samples at level i which use slightly different mean fields. When 
we then endeavor to evolve X ( n +1 , it is unclear what value of the mean held to use in 
order to ensure that the X ,^ +1 and X ( n are identically distributed. 

Our algorithm attempts to avoid this difficulty by preventing Nf; from growing as L 
increases. We do this by tracking L est , a prediction of the final value of L , and the 
variances Ve we expect to encounter at higher levels that have yet to be sampled. In this 
way, once we sample level £, we are unlikely to need to resample it. 

We present an outline of the algorithm first, then discuss these subtleties in more depth 
as they pertain to each step in the outline. We define q = max n ||P,( — P^W <*>> an d the 
algorithm proceeds as follows: 

Algorithm 1 (Mean-Field Multilevel Monte Carlo). 

(1) Fix an error tolerance £ and set L — 1. 

(2) Choose an initial time step Ato and number of samples Nq and N{. 

(3) Compute P°, P°, and V 0 using (21)-(22) with the specified Af 0 and Nq samples. 

Then compute the analogous quantities at level 1 using (23)-(24) and N{ samples. 


(4) Estimate the number of necessary levels L est using 

L Cbt = f2 log 2 (ei/e) + 2] . 

(5) Compute N 0 and N\ using but with L replaced by L est , and assume 

A = A 

Ati A t \ 

for each 2 < t < L est . 

(6) If N 0 > N l 0 or Ni > N[, repeat steps (3)-(5) with Nq —> N 0 , N{ —> N±. 

(7) While 6l > s(l — l/\/2), iterate the following: 

(a) Increment L. 

(b) Set N l = N l _ i/2. 

(c) Compute P r [ J , R/ n , and V L using (23)-(24) with N L samples. 

(d) Set L est via 

pcbt — _£/+!+ \2 log 2 (e^/e:)] . 


(27) 


(28) 


(29) 


(e) Set Ni, i = L,...,L est according to (11), again replacing L with L est and 
assuming 

A = A ( 30) 

A t e A t L v ; 

for all t satisfying L < l < L est . 

(8) Return P%. 

The first three steps in the outline above are standard. In step 4, we estimate the total 
number of levels we expect to use, based only on knowledge of i. = 0,1. The estimates 
come from our strong convergence results, which imply 

||EP(A'')-EP(A' 4 )||«cA(; /2 (31) 

for some unknown c. Using what essentially amounts to Richardson extrapolation, it is 
straightforward to show that the smallest L satisfying 

||EP(A'')-EP(A'«,)|| < -V 


(32) 
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is approximately given by (27). A directly analogous computation gives (29). 

Step 7b) and equations (28) and (30) arise directly from our expectation that Vg scales 
like Atg - this is confirmed (up to a logarithmic factor) in the following section. Since 
Ng OC y/VgAtg, we expect each Ng to be half the size of the previous one, thus justifying 
7b). Moreover, one expects each term in the sum appearing in ([TTj) to be equal, so we 
use the last known term to estimate the remaining unknown terms. 


Inutitively, then, this algorithm yields approximations P„ of EP(AV) satisfying 


max ( E 


pi - ep(vj) 


1/2 


(33) 


with computational complexity comparable to that of standard MLMC methods. In 
the following section, we will see that, in fact, the complexity is increased by only a 
logarithmic factor as a result of the mean held. 


4. Convergence and Complexity Results 


4.1. The Linear Case. We prove convergence of the multilevel scheme for McKean- 
Vlasov processes of the form 

dX t = (AX t + PE [X t ]) dt + a(t)dW u (34) 

where A and P constant matrices. For d — 1 and certain choices of A and P, this is the 
Shimizu-Yamada model of muscle contraction [1611391110]. A model of this type has also 
been used to describe target leverage ratios in finance [26]. 

The satisfy 

Yii = X? + (AX‘j + BX‘f A t, + a(ti)AWi‘, (35) 


and 



Kjt + 


1 

Ne 



1=1 


(36) 


4.1.1. Strong Convergence Theory. Following standard propagation of chaos arguments, 
we define the new quantity X^ 1 by 

hfi = X? + ( AX.? + BE[X?])At, + a(t‘ n )AW?. (37) 

Simulation of X^ is impossible without advance knowledge of its expectation, but it has 
the important feature that its samples are independent. Intuitively, one expects that 
X(f —> XC as the Ng oo. The following theorem confirms this intuition and gives the 
rate of convergence: 


Theorem 1 (Convergence). Algorithm 1 applied to (34) has the following bound on each 
sample: 

. i 


max max E ||A" - AI|| < K' , -= 

r<t n <T/At r \ V^O 

for some positive constant K' that is independent of £. 
Proof. See appendix A. 


£ 

m= 1 


Ap 

Nrr. 


(38) 


□ 


Standard techniques show that the Xf l converge strongly to solutions of (34) at rate 
1/2 — 5 for any positive 6 - see e.g. proposition 3 in [33]. As an immediate result, we have 
the following corollary 
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Corollary 1. The Xfff converge strongly to solutions of (3f). In particular, if X l t is a 
solution of (34) using the same initial data and Brownian path as Xf l , then for any S > 0 


E 


X 


Li 


XI 


< K | At 1 / 2 - 5 + 




E 

m =1 


A t r 


N n 


(39) 


for some K that is independent of I. 


This completes our understanding of the strong convergence of the multilevel samples 
in this special case. This implies weak convergence at the same rate. 

4.1.2. Complexity Theory. An additional corollary to Theorem 1 which leads directly into 
our complexity analysis is as follows: 

Corollary 2. For any Lipschitz payoff function P and 5 > 0, 

t 


E 


Pt--EP(X«) 


< K | At 1 / 2 - 6 + 


I 


VN~o 


+ E 


m 


m= 1 


A t r 


N n 


(40) 


for some K independent of t. 
Proof. We first define 

V 1 = 


N t 

Ft + E \ p X‘) - PX%) 
1=1 


where the definition of Xf^ 1 is analogous to that of Xff. We have 


E 


pt-EP(Xp) 


< E 


pt _ pt 


+ E 


vi - EVi 


+ 


EP' - E P(X,,J 


(41) 


(42) 


We bound each of the three terms on the right in turn. First, Theorem 1 and summation 
up to I immediately imply 


E 


/A _ -p' 


< K 


I 


sTN„ 


E 

771=1 


m 


A t r 


N„ 


(43) 


Since EVf = E P(Xff) by construction, the independence and strong convergence of the 
X^ imply 


E 


vi - e Vi 


<K 'k 


E 

777=1 


AC 

N„ 


1/2 


E^-EP(AE) <KyfMt (44) 


Combining these three bounds and using the sub-additivity of the square root function 
gives the desired result. □ 


Theorem 2 (Complexity). For the MLMC algorithm (21)-(11) applied to (34), there 
exist choices of L and W depending on e such that 


E 


F - mx lt ) 


< £ 


(45) 


and the computational complexity n satisfies 

k = O (c~ J | loge| 5 ) . 


( 46 ) 


























































10 


L.F. RICKETSON 


Proof. Recalling Corollary 2, we simply choose L and Ng so that the time-stepping and 
sampling errors are each bounded by e/2. Namely, 


L = log 2 ( 2 Ky/Ato/e^j , 

N 0 = \AKL 2 (L + l) 2 e -2 ] , 

N t = \AKAt e L 2 (L + l) 2 e -2 ] for £ > 1. 


This clearly implies (45). Moreover, 


hi ~ 


L 


E 


Ne_ 

Ate 


e 2 L 3 (L + l) 2 ~ e 2 |loge| 5 , 


thus completing the proof. 


(47) 


(48) 

□ 


Evidently, the presence of the mean field only increases the complexity of MLMC by an 
additional logarithmic factor. It bears noting that this proof is not constructive - without 
advance knowledge of K, it gives no information about how to choose L and only tells us 
about the relative sizes of the Ng. The choices specified in 3.2 are motivated by analogy 
to the standard MLMC method, and are found to be effective in numerical experiments. 


4.2. Partial results toward a more general theory. Of interest is the extension to 
more general classes of McKean-Vlasov processes than (34). While a complete theory 
is work in progress, we present partial results. Our numerical results provide strong 
evidence for the convergence and complexity of the scheme in quite general contexts. 

We begin by reprinting for clarity the type of process we work with: 


dX t = a (X t , t, E R{X t )) dt +/3 (X t , t, E R{X t )) dW t , X 0 = £, (49) 

where £ is a random variable with some known density. A Erst desirable result is that X,f 
and X^ 2 should be nearby in a strong sense, so that the variance of the level differences 
is small. We work under the following mild assumptions: 


Assumption 1. Both a and (3 have uniform, Lipschitz bounds in each of their arguments. 


Assumption 2. Both a and (3 have uniformly bounded expectations. That is, for any 
time interval t e [0, T] and p > 1, there exists a constant K T , P > 0 such that 

E[||a(X,f,ER(X(f)))f] < K T}P (50) 

and similarly for f3, for every p 6 [1 ,p max ], where p max > 2. 


We then have the following theorem, which demonstrates that at the time steps shared 
by levels £ and £ — 1, the approximations at the two levels are close in the strong sense. 


Theorem 3. Under assumptions 1 and 2, the multilevel scheme for (49) satisfies 

O (y/AT e ) : p = 1 

= { O (Atg |log Atg\) : p = 2 

O (Atg) : 2 < p < p max 

for every even n satisfying n < T / Atg, and with the maximum taken only over even k. 







E 

max 

k<n 

v e 

A fc ~ A fc /2 

V 

- 





1 


(51) 


Proof. See appendix B. 


□ 


This is the bulk of the desired result for even n. The following corollary generalizes to 
odd n: 
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Corollary 3. Under assumptions 1 and 2, 

O (a/A Tg) 

= { O (Ate |log At,|) 
O (At e ) 


max E 

k<n 


y £ yi 
A fc _ A fc /2 


P — 1 
p = 2 

2 < p < Pr 


(52) 


for every n (even or odd) satisfying n <T/ Ate- 

Proof. For even n, this is a direct consequence of theorem 1. For odd n, we note that 


E 


yi yi 
A n — A n/2 


< KE 


yi VI 

A n— 1 A ( n — 


(n-l)/2 


+ ™[||aL 1 -q„_ 1)/2 |r]Atf 


(53) 


A'E 


Pn -1 P ( n — i )/2 


At 


p /2 


Since n is odd, n — 1 is even and theorem 1 provides a bound for the first norm on the 
right. The bounds on expectations of a and ft give uniform bounds on the second two 
norms. The result immediately follows by taking maximums. □ 

We thus have the following bound on the variance of the level differences. 

Corollary 4. For any Lipschitz function P, 

= O (At e \ log At,|) (54) 


max Var 

k<n 


pl pi 

Cc Cc/2 


for every n < T / Ate- 

Proof. By assumption 1 above, we have 


Var 

pi_ pi 

Cl Ci/2 

< E 

rsj 

1 yi +1 yi | 

| A n — A n/2| 

2 ' 

+ 

E 

P^ _ 

/ 2 



< E 

r^j 

W V! 

A n — A n/2 

2 " 

+ E 

v! y-1 

A n — A n/2 


(55) 


for n even, while for odd n, 
Var 


r n ■‘■n / 2 


< E 

v! VI 

A n — A n/2 

2" 

+ E 



_ , 

2 


r 

+ 

E 

pl _ pl 

Cl -'n/2 

+ 

E J 


X: 


n /2 


n /2 


yi 

A (n+l)/2 


(n+l)/2 


+ E 
E 


yi 

A n/2 


X, 


(n-l)/2 
2 


n/2 


(n-l)/2 


< E 


I v-m i 

| A n — A n/2| 


+ E[||.Y' +1 -A'' /2 ||] 2 + 0(Af ( ). 


(56) 


By corollary 3, we have 

Var [P(X l n +1 ) - P l n/2 ] = 0(Ate\ log At,|) + O (Ate) (57) 

for both even and odd n. Everything is independent of n, so maximizing over n and 
ignoring the smaller terms gives the desired result. □ 

If a scales like 

2 


A ~ — 

e 1 


£ 

,i=o 


A 

Ate 


(58) 


as in the standard MLMC case - see (|13j) - this variance scaling yields an algorithm with 
k ~ £ -2 (log£) 4 , in close approximation to the result of Theorem 2. 
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5. Numerical Results 


We conduct three types of numerical test. First, we apply our scheme to an equation of 
type (34) to confirm the convergence and complexity results we’ve proved in that context. 
Second, we consider a plane-rotator model of a ferromagnet with infinite interaction range. 
Models of this type have been explored by a variety of authors dSl EH |29l |30]. Even 
though this system does not satisfy the hypotheses of section 4.1, we observe the same 
convergence and complexity scalings. Third, we apply our method to the Vlasov-Poisson 
system, which has many applications in plasma physics. We again observe the predicted 
convergence and complexity behavior. 


5.1. Linear equation tests. We work in d — 1 with fixed a. That is, we solve 

dX t = (aX t + bE[X t ])dt + adW t . (59) 

It is straightforward to find exact solutions for the mean and variance of the solution. 
Specifically, by taking moments of the corresponding PDE and solving the resulting 
ODEs, we find 

E[X t ] = E[X 0 ]e (ffi+b)f , Var[W] = (var[X 0 ] + ^ e 2at - (60) 

These exact results are used for comparison to the results of our multilevel method. 

We set a = —1/2, b = 4/5, cr 2 = 1/2, and P(x) = x 2 . Since R(x) = x in this case, we 
can estimate the variance of X t L by 

Var[X, s ] (flf)\ (61) 

and compare to the exact solution in ( |60j ). We first examine the convergence and complex¬ 
ity of the method. These results appear in fig. [l] Here and in the proceeding convergence 
studies, error bars give estimates of the standard deviation of the ZP-error found by per¬ 
forming 20 independent computations for each e. Model complexities and computation 
times shown are averages over those same 20 computations. 




Figure 1. Convergence and complexity studies for the linear system, 
demonstrating the predicted behavior. 

We then check that the Vg scale as expected. These results are shown in fig. [2j where 
we find strong evidence for O(Atg) scaling - slightly better than our theory requires. 

Additionally, it is worth noting that in our computations, both here and in our sub¬ 
sequent tests, we find evidence for O(At) weak error, which is better than our strong 
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Figure 2. Variance of level differences in the linear model. The observed 
O(Ate) scaling is consistent with our analysis. 


convergence analysis requires. This is manifested in the fact that we require an addi¬ 
tional level when e is reduced by a factor of 2, not y/2 as one expects for 0(y/At) weak 
convergence. A rigorous proof of this improved weak convergence in arbitrary dimension 
is an interesting avenue for future research. 


5.2. Plane-rotator tests. We use the equation studied in [2T j: 

dX t = {KK [sin(A; - X t )] - sin JQ} dt + VXrdW t , (62) 


where X t E M, r is the background temperature, K a coupling constant, and E' indicates 
expectation over the primed variable. Physically, X t is the angle from some fixed axis of 
some oscillator - e.g. the magnetic moment of an atom or molecule. These are presumed 
to interact via the K¥! sin(A" f ' — X t ) term, and are subject to some external field or 
anisotropy aligned with the axis that leads to the — sinAj term. The stochastic term 
represents immersion in a heat bath at temperature r. 

A simple trigonometric identity makes our method directly applicable: 


dX t = {K (E [sin A 4 ] cos X t — E [cos X t \ sin X t ) — sin X t ) dt + VXrdW t . (63) 

We set K = 1, r = 1/8, P(x) = sin a;, A t 0 = T, generate samples of A" 0 from 
A r (7r/2, 37 t/4 ), and simulate to the terminal time T = 5. Our multilevel simulations 
are compared to the results of an over-resolved single level scheme of the type (16), using 
At = T/1024 and N = 5 x 10 7 . 

As above, we plot convergence and computation time data in fig. [3j We again observe 
the expected behavior. Additionally, we plot the V/ in fig. [4j We find results in agreement 
with the 0(A^| log A^|) predicted by corollary 4. 


5.3. Particle-in-cell tests. The Vlasov-Poisson system, 


d t f + v-V x f-V(j>-V v f = 0, 


-A0 = p - po 


f dv - po 


(64) 
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FIGURE 3. (Left) Convergence of the multilevel method for the plane- 
rotator system. (Right) Computational time and model complexity for 
plane-rotator system. 



A t e 

FIGURE 4. Variance of level differences in in the plane rotator model. 
The observed O(Ate) scaling is consistent with our analysis. 


for the phase space particle density f(x,v,t ) and electrostatic potential <f>(x,t), is an 
important equation in a variety of plasma physics applications - see e.g. 
among many others. In the simplest case, po is the constant po = f pdx. 

The particle-in-cell (PIC) method has long been the method of choice for this system. 
A full review of the method is beyond the scope of this paper - we refer the interested 


reader to [6]. To briefly summarize, we rewrite (64) in terms of particle trajectories: 


dx 

dt 


= v , 


dv „ 

* = - v ^- 


(65) 


For simplicity, we work in one space and one velocity dimension, with periodic boundary 
conditions. We introduce a grid in x-space with grid points x* and grid size h. We then 
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approximate 


p(xi) = / S(x — Xi)f dxdv = E f [<5(a: — £*)] ~ E f 


S 




h 


( 66 ) 


where S(x) is some approximation of the Dirac delta function. We use the most common 
choice in elementary PIC schemes: S(x) = max{l — |x|,0}. 

The p(xi) play the role of R, and the map p(xi) —» —V</>(xj) - using the FFT to 
compute (j) and its derivative - plays the role of a. We assume that V</ is continuous and 
piecewise linear to completely specify a. 

Evidently, /3 = 0 for this system. However, Coulomb collisions are typically introduced 


as a Fokker-Planck operator on the right side of (64), thus introducing non-zero (3 and 


making use of the full weight of the work presented here. This will be expanded upon in 
future work. 

In the numerical experiments presented here, we work in a dimensionless formulation 
with domain length C = 20, h = 1, terminal time T = 12, and A t 0 = 1/3. Initial particle 
positions are sampled from 1V(10,6) mod £, with the modulo present merely to ensure 
all particles reside in the computational domain. Initial particle velocities are sampled 
from 1V(0,1). We let the payoff function be the vector of the p(xi). That is, P — R in 
this case. 

Convergence and computation time are shown in fig. [5| As before, we compare to a 
highly accurate single-level (standard PIC) simulation when the analytic solution is not 
available. Here, we use 3.2 x 10 5 particles per cell and At = 10 -3 with h unchanged. 
We again observe the expected convergence and complexity scaling. Admittedly, the 




Figure 5. (Left) Convergence in density p. (Right) Computation time as 
a function of e, demonstrating the predicted scaling. 


factor of ~ 4 speed gain of the multilevel scheme in this context is not overly impressive. 
This is largely due to the CFL condition’s restriction on the size of Afo, which limits 
the number of levels one can explore using a serial code on a personal computer. On a 
more powerful machine, one could explore smaller £, where much larger speed increases 
are to be expected. Moreover, an interesting direction for future research is the use 
of the multilevel scheme in concert with recent developments in implicit PIC schemes 
HH H2J H31 EJ, which eliminate upper bounds on time-step. 

As before, we also confirm that Vi scales as predicted. In fig. [6j we again find excellent 
agreement with the prediction of corollary 4. 
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Figure 6 . Variance of level differences in 1D1V multilevel PIC scheme. 
The O(At) scaling agrees with analysis. 


6. Conclusions 

We have presented and analyzed a new multilevel Monte Carlo algorithm for simulation 
of McKean-Vlasov processes. In a particular case, convergence and complexity results 
have been established that demonstrate performance comparable to the standard MLMC 
method for SDEs. In addition, we have proved a variance scaling result that suggests 
similar performance in much greater generality. 

The extension of MLMC to McKean-Vlasov processes opens up a variety of applications 
to interacting particle systems. Numerical tests have been performed that explore a 
few of those potential applications, including plane-rotator models and kinetic plasma 
dynamics. In each case, the expected convergence and complexity are observed, even in 
cases in which our theory does not strictly apply. 

There are numerous directions in which this work could be extended. From a theoretical 
perspective, rigorous convergence and complexity results in more generality are of obvious 
interest. The results in section 4.2 represent a step in that direction, but are by no means 
complete. Higher-order time stepping schemes and their effect on the complexity and 
stability of the method are also areas for further investigation. Additionally, in this work 
we have assumed that the refinement factor At^/At^i = 2 for simplicity, but this choice 
of ratio has been found not to be optimal for standard MLMC im - for example, 4 is 
a popular choice. It would be interesting to investigate generalization of the method 
presented here to arbitrary refinement factor, and to study what factor(s) might be 
optimal in this case. 

In applications, plasma simulation is of particular interest to the author. As already 
mentioned, implicit time-stepping techniques used in conjunction with the multilevel ap¬ 
proach presented here are particularly promising for accelerating PIC schemes, since this 
would eliminate the upper bound on At 0 - One also wonders whether multilevel methods 
in both space and time can be used in concert to further accelerate the simulation, as in 
|21| . Each of these developments would also be a boon to simulations of Fokker-Planck 







MLMC MCKEAN-VLASOV 


17 


models for rarefied gas dynamics - e.g. [20]. Simulations in higher dimensions are of ob¬ 
vious interest, as are the incorporation of higher order time-stepping schemes and shape 
functions. Each of these is a topic of current research for the author. 
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Appendix A. Proof of Theorem f 

For brevity, we will omit the sample index i in this proof, since we always refer to the 
same sample. We note first that 

n— 1 

(K - X‘ n ) - (A'y, 1 - X^l) = At, V { A (Xi - X‘) + B(X‘ t - £[*„'])} 

k =0 
n/ 2-1 

— At, -i Y {a(x‘Y - x'f 1 ) + B(x[r l - eiaC 1 ])} 

fc =0 

(67) 


for even n, and something quite similar for odd n. 
Now, we note that 


“^(X’Y - x‘Y = £(*£-■ - W) + 


l-l - yi - 1\ 
0 ) 


k =0 


k =0 


n— 1 


/\tf 


At, j2Yt i - xt 1 )+Wv'y - x^) 


k =0 




t -\ _ y £- 1\ 
n/2 n/2 > 


( 68 ) 


because of the linear interpolation dehnitions. In addition, we have 

n— 1 n/2 —1 

At,Y,{xi - EK']} = Ai,_, V |a'-‘ - EH'- 1 ]} 


k =0 


k =0 
n-1 ^ N( 

Ai 'E V £ {(N - xt) - (X :‘ /2 - x£) 

k =0 £ i= 1 

I / J_1n / 


(K/i - E K/i l) - (V - eh'- 1 ]) 


Ti — l ( Ne 

At ‘J2 v D-h-N/JJ-E 

fc=0 l e i=l 


xt - X , 


^-1 

fc /2 


(69) 


Since we’ve already concluded that A^ 2 and X^ 2 are identically distributed, we can 
replace one by the other inside expectations. Dehning 5 = ( X— X*) — (Av^ ~ Tr^ 1 ), 
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we have 


71—1 


4 = A&t^l - ^r(K/2 - x%) + a«, V 


71—1 




/c=0 

71—1 


/c=0 




+ M 1 'EfE s 'i‘ + {<*& - E [</tl) - (•?«'■' - E«“‘])} , 

fc=0 £ i=l 

where the u e k , defined by 


( 70 ) 


N t 


12 1, = 


Ni 


Ew' - x ti) - e 


i= 1 


W W-l 
'W ^fc/2 


(71) 


are random variables with zero mean and variance 0(Ate)/N,\ by standard arguments. 

We take the cartesian norm and mean of ( |70| ) and, after taking advantage of the 
exchangeability of different samples, have 

71—1 


71—1 r 

E|KH<(|.4| + |fl|)Ai t J]E||,5'||+Ay 


+ 


A t. 


E 


fc =0 


yi-l _ yt -1 
^n/2 Wi/2 


Ate 

w" 


(72) 


+ E 


X 1 -} - E 

71/2 


yi -1 
dl /o 
71/2 


Importantly, this bound holds - albeit with slightly different constants - for all n, though 
we have only derived it for even n. Deriving its analogue for odd n is directly analogous, 
so we omit that computation here. 

Applying a discrete Gronwall inequality (see e.g. [5E] ) gives 


E|K||<(l+CTe' 


CT\ 


Ate Ate 

I< JL h- 


VN~e 


E 


yt-1 _ yt -1 
Wi/2 ^n/2 


+ E 


X 1 -,} - E 

71/2 


yi -1 
dl /o 
n /2 


where C := \A\ + \B\. 

With this in hand, we note that a trivial telescoping sum implies that 


rnaxE \\X e n — A^ll < maxE 11 — A°|| + maxE ||5; 

n II II „ / J rj. 


(73) 


(74) 


771—1 


Defining e l = max n E A^ — II, e* = max n E 


^-E[^] 


A < A 


yiVo 


E 

771 — 1 


At r 


£-i 


AG 


+ RE A ^ m+?m } 


and using (73), we have 

(75) 


771 — 1 


Furthermore, by the definition of X, we have 

e™ < F 1-1 + e m + e m_1 < e m + 2e m_1 + ... + 2e° 

We thus have 


(76) 


t'< K i -L + w A 

■■ vw 

< AI -L + w x 


€-1 


771—1 


+ A t m < e m + 


771—1 

*-l 


€ + Z, 6 

r =1 

771 


( 77 ) 


E 


771—1 


r=l 


since e° < Aq/a/Aq- 
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This clearly implies 


max e r < K \ + 'S~' m ] + 'Y mAt r 

r<< ~ \VN o tx 


max e . 

r<m 


Applying discrete Gronwall to this over t gives 


max e r < K _ 

r<e \ ^No 



1 + Y rnAt m exp E 3 At 


Bounding sums by their infinite analogues, which are convergent, we finally have 


max max E || A; - X^\\ < K' —= 

r<e n V vA 0 



Appendix B. Proof of Theorem 3 

We will do the proof in the autonomous case - i.e. neither a nor f3 has explicit depen¬ 
dence on t. The generalization to nonautonomy is straightforward, but the presentation 
is murkier clue to the additional terms. Throughout the proof, K will denote a generic 
constant which may depend on T and p, as well as the Lipshitz constant in assumption 
1 and the bounds on the norms in assumption 2. ft will not, however, depend on l or n. 
We note first that 

V' +2 = Xi + (of, + o' +1 ) At, + &AW‘ + /)' +1 A W‘ +I 

= X e n + of,A tt-x + AfAlCy + (c4 + , - a’ n ) At, + (P‘ n+1 - A)AW‘ h+ , * 
where we’ve suppressed the common i superscript. Subtracting the analogous expression 

for X(n+ 2)/2> we have 

*n+2 - *(n + 2)/2 = Y ~ *n/2 + {< ~ ^ /2 ) A t t _, + (/£ - ft /2 ) A 

+ («l + i - a^Ati + (f3* n+1 - p e n )AWY- 

Iterating, taking norms followed by maximums, followed by expectations, and defining 

r pn 

Snip) = E max m < n X e m - X; n/2 , we have 


S £ n+2 {p) < At£_,E max Y 

m<r) < 


E max Y (Pi ~ /3fc/ 2 )AIPfc /2 


Af^E max 

m<n 


[ a k+l a k) 


+ E max (/?Li - Pi)AW‘ +1 


We label the four expectations above I-IV and analyze them separately. Terms I and III 
have in common the absence of Brownian motion, so we investigate them first. 
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B.l. Bounding I. We have 


I < (n/2) p_1 E 


OLu ~ OL 


k even 


To bound the expectation inside the sum, we write 


EOK - ayf ] <k\e 


v e y- i 

^k~ ^k/2 


pi 


k/2 | 


+ E 


d£ td£— 1 

n k n k /2 


(84) 


(85) 


by virtue of a being Lipschitz in both arguments. By definition, 

Nt+1 


E 


Ri-R 


\i -1 
k/2 




Efyfyfy-fyvt 

j= 1 


j 

fe /2 


AT * 

< — > E 

3 =1 


Y^3 vT/ 

A fc A fc /2 


( 86 ) 


A fc _ A fc /2 


< A'E 

where the last line comes from the exchangeability of the samples. Substituting this into 


(85), we have 


E [IK +1 - 4/2If] < A'E [||.y +I - -4/2in . 


(87) 


Notice that this bound still holds when a is replaced by /3, since they satisfy all the same 
assumptions - this will be used in bounding II. Returning to I, we now have 


I < K(n/2) p ~ 1 E 


k even 


V £ Yi 
^ k ~ ^k/2 


PI 


< Kn-- 1 Y. 6 ‘M 


( 88 ) 


k even 


B.2. Bounding III. Next, we turn our attention to III. By again taking advantage of 
the fact that a is Lipschitz, we have 


III < (n/2) p-1 E 


E 

,k even 


a k +1 a 


£ IIP 


(89) 


<Kn--' Y, {E[|Ki-X'|n +E 


k even 


jd£ td£ 

JX k +1 n ‘k 


Note next that 

e oiai« - xrn < 2^‘e oK +i in Ai ? + ,+ ^-'e oi4 +i in a c 

< K(At r t + Ai£ /2 ), 


(90) 


where the second line follows from the bounded expectations of the coefficients (assump¬ 
tion 2). 


To attack the last norm in (89), we note that 

1 N e 

1 1 . ir 1 _ 


Rl + 1 = x ( Ri To + R 


k /2 ^ L (k+ 2)/2 J 


1 1 f 

+ yE 

£ 3=1 k 


.3 _ _ ( 64/ I ffyj 

fc+1 2 V V2 A - fl (n+2)/2 


(91) 
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so that we may write 

p 


E 


Dh _ JDl 

ri /c+1 ■H'ife 


< -E 
“ 2 

+ 2 P-1 E 


< -E 
“ 2 


pt—1 _ E>t—1 

XX (fc+2)/2 " n 'fe/2 


i?i +1 - Rl - - (Rin + R 


pt _ Sr 

■ n '(n+2)/2 rl n/2 


L fc/ 2 


l ( fc + 2)/2 


+ -Rfc/2 


(92) 


+ K \E [\\X e k+1 - X[\\ p ] + E 


^(n+2)/2 — ^n/2 


< -E 
“ 2 


Sr _ Sr 

- fl (n+2)/2 rl n/2 


+ K \ At p + At p / 2 1 


where the last inequality results from the Lipschitz bound on R and (90). 

In the end, in (92), we have a bound on a quantity in terms of its analogue one level 
lower. We can thus iterate the process, finding 


E 


EK EK 

It k +1 


< 2" (£+1) E 


So So 

rti K 0 


+ K V 2”-'(AC + Atf). 


m =0 


We now specify p = 2 and note that for < 1, 

2 " 


E 

For p — 1, we have 

And finally, for 2 < p < p max , 

As such, we define 


pr p/ 

W+i 11 k 


< KAti(e+i). 


E 


E 


pr+i _ pt+i 

rt n +1 Wi 


pr+i _ p^+i 

- r ''n+l - n 'n 


< KAt 


1/2 


PI 


< KAtf. 


At 


1/2 


e r(p) : = \ Ate |log Ate\ 
Ate 


p = 1 
p = 2 
p > 2 


so that upon plugging into (89), we have 


E [||“»« - “hT] <Ke t (p). 


(93) 


(94) 


(95) 


(96) 


(97) 


(98) 


Notice that this bound also applies when a is replaced by /3, since they satisfy all the 
same assumptions - this will be used in bounding II and IV. For III, we have 


III < Kn p ee{p) 


(99) 


B.3. Bounding II and IV. Bounding II and IV is qualitatively different from I and III 
because the arguments of the norms are martingales. Recall that the discrete Burkholder- 
Davis-Gundy inequality requires that the maximum of a martingale is bounded by its 
quadratic variation. More precisely, 


r i 



n 

P/2" 

max ||M(m )|\ p 

m<n 

< KE 


- M(k - 1)) 2 





k= 1 



( 100 ) 
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for any discrete martingale M. It follows that 
II = E 


max 

m<n 


< KE 


pi 


p/2 


E (d +I - 

’ even 

E ((fi +1 - 

: even 
n 

< a '« p/2 - 1 e e m v +i - x m in e oi Aw * +i in 

k even 

n 

< K(nAt,r/ 2 - l At e E 4M. 


/c even 


Similarly, for IV we have 

IV = E 


pi 


where we’ve used (|87|) with a —> f3. 

n 

E (dll - d + 1 )Aw'+‘ 

k even 

n 

< Kn^-'At ”/ 2 e E Old: 1 . - d +1 in 

even 

< K{7iAte) p/2 e e (p) 


where we’ve used (98) with a —> /3 to get the last line. 


( 101 ) 


( 102 ) 


B.4. Discrete Gronwall. Finally, substituting (88), (99), (101) and (102) into (83), we 
have 


^ +2 (p) < K ^ (nAt,Y 1 At, E s l(p) + (nA t,) r e e (p) 




(103) 


/c even 


Since, nAte is bounded by the simulation time T, applying the appropriate discrete 
Gronwall inequality and noting that £ = log 2 (T/ Ati) gives the desired result. Namely, 

8 e n (p) = O (e f (p)) for all n < T/Atg+i- (104) 

This completes the proof. □ 
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